DeepSeek: What Happened, What Matters, and Why It’s Interesting

Update: 2025-01-28

First:

- Apologies for the audio! We had a production error…

What’s new:

- DeepSeek has made breakthroughs in both how AI systems are trained (making training much more affordable) and how they run in real-world use (making them faster and more efficient)

Details


- FP8 Training: Working With Less Precise Numbers
  - Traditional AI training requires extremely precise numbers
  - DeepSeek found you can use less precise numbers (like rounding $10.857643 to $10.86)
  - This cuts memory and computation needs significantly, with minimal impact on quality
  - Like teaching someone math using rounded numbers instead of carrying every decimal place (see the sketch below)
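
A minimal sketch of the rounding idea, using int8 quantization as a simpler stand-in for FP8 (an 8-bit floating-point format); the helper names are invented, and this is not DeepSeek’s actual training kernel:

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Round float32 values onto 255 integer levels, remembering one scale factor."""
    scale = np.abs(x).max() / 127.0 + 1e-12          # largest value maps to +/-127
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale              # convert back to real units

x = np.random.randn(4, 8).astype(np.float32)         # activations
w = np.random.randn(8, 3).astype(np.float32)         # weights

qx, sx = quantize_int8(x)
qw, sw = quantize_int8(w)

# Multiply the "rounded" versions cheaply, then rescale the result.
approx = dequantize(qx.astype(np.int32) @ qw.astype(np.int32), sx * sw)
exact = x @ w
print("max error:", np.abs(approx - exact).max())    # typically small next to the values themselves
```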


- Learning from Other AIs (Distillation)
  - Traditional approach: an AI learns everything from scratch by studying massive amounts of data
  - DeepSeek's approach: use existing AI models as teachers
  - Like having experienced programmers mentor new developers (see the sketch below)
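
A minimal sketch of classic soft-target distillation, assuming a toy PyTorch setup; DeepSeek’s distilled models were reportedly trained on outputs generated by the larger model, but the core idea is the same: the student learns from a teacher model rather than from raw data alone.

```python
import torch
import torch.nn.functional as F

teacher = torch.nn.Linear(16, 10)        # stand-in for a large, frozen teacher model
student = torch.nn.Linear(16, 10)        # smaller model being trained
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
temperature = 2.0                        # softens the teacher's probabilities

for step in range(100):
    x = torch.randn(32, 16)              # a batch of inputs
    with torch.no_grad():
        teacher_logits = teacher(x)      # the teacher's "opinion"
    student_logits = student(x)

    # The student is trained to match the teacher's softened distribution.
    loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature**2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```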


- Trial & Error Learning (for their R1 model)
  - Started with some basic "tutoring" from advanced models
  - Then let it practice solving problems on its own
  - When it found good solutions, these were fed back into training
  - Led to "Aha moments" where R1 discovered better ways to solve problems
  - Finally, polished its ability to explain its thinking clearly to humans (see the sketch below)
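
A toy sketch of that trial-and-error loop, not DeepSeek’s actual reinforcement-learning recipe; `model.generate`, `reward_fn`, and `reinforce_step` are hypothetical placeholders standing in for real generation, answer checking, and a policy update:

```python
def training_round(model, problems, reward_fn, num_samples=8):
    """One round of practice: try each problem several times, keep what worked."""
    for problem in problems:
        # 1. Let the model attempt the problem several times.
        attempts = [model.generate(problem) for _ in range(num_samples)]

        # 2. Score each attempt, e.g. 1.0 if the final answer checks out, else 0.0.
        rewards = [reward_fn(problem, attempt) for attempt in attempts]

        # 3. Nudge the model toward attempts that scored above the group average
        #    and away from those that scored below it.
        baseline = sum(rewards) / len(rewards)
        for attempt, reward in zip(attempts, rewards):
            reinforce_step(model, problem, attempt, advantage=reward - baseline)
```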


- Smart Team Management (Mixture of Experts)
  - Instead of one massive system that does everything, DeepSeek built a team of specialists
  - Like running a software company with:
    - 256 specialists who focus on different areas
    - 1 generalist who helps with everything
    - A smart project manager who assigns work efficiently
  - For each task, only 8 specialists plus the generalist are needed
  - More efficient than having everyone work on everything (see the sketch below)
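
A minimal sketch of the routing idea as a toy PyTorch module; this is not DeepSeek’s production MoE layer, but the sizes mirror the 256-specialist, 8-active, 1-generalist description above:

```python
import torch
import torch.nn.functional as F

class TinyMoE(torch.nn.Module):
    def __init__(self, dim=64, num_experts=256, top_k=8):
        super().__init__()
        self.experts = torch.nn.ModuleList(
            [torch.nn.Linear(dim, dim) for _ in range(num_experts)])  # 256 specialists
        self.shared_expert = torch.nn.Linear(dim, dim)                # the generalist
        self.router = torch.nn.Linear(dim, num_experts)               # the project manager
        self.top_k = top_k

    def forward(self, x):                  # x: (num_tokens, dim)
        # The router scores every expert, but only the top 8 per token do any work.
        weights, chosen = F.softmax(self.router(x), dim=-1).topk(self.top_k, dim=-1)
        outputs = []
        for t in range(x.shape[0]):
            routed = sum(w * self.experts[int(i)](x[t])
                         for w, i in zip(weights[t], chosen[t]))
            outputs.append(self.shared_expert(x[t]) + routed)
        return torch.stack(outputs)

moe = TinyMoE()
print(moe(torch.randn(5, 64)).shape)   # torch.Size([5, 64]); only 8 of 256 experts ran per token
```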


- Efficient Memory Management (Multi-head Latent Attention)
  - Traditional AI is like keeping complete transcripts of every conversation
  - DeepSeek's approach is like taking smart meeting minutes
  - Captures key information in compressed format
  - Similar to how JPEG compresses images (see the sketch below)
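
A toy illustration of the compress-then-expand idea, assuming made-up sizes and untrained projections; in the real model these projections are learned, and the details of Multi-head Latent Attention are more involved:

```python
import torch

dim, latent_dim = 512, 64                     # hypothetical sizes
down_proj = torch.nn.Linear(dim, latent_dim)  # "take meeting minutes"
up_proj_k = torch.nn.Linear(latent_dim, dim)  # rebuild keys from the minutes
up_proj_v = torch.nn.Linear(latent_dim, dim)  # rebuild values from the minutes

hidden_states = torch.randn(1000, dim)        # 1,000 past tokens of conversation

# The cache stores 64 numbers per token instead of 2 x 512 (full keys + values).
kv_cache = down_proj(hidden_states)
print(kv_cache.shape)                         # torch.Size([1000, 64])

# When attention needs them, keys and values are reconstructed from the "minutes".
keys = up_proj_k(kv_cache)
values = up_proj_v(kv_cache)
```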


- Looking Ahead (Multi-Token Prediction)
  - Traditional AI reads one word at a time
  - DeepSeek looks ahead and predicts two words at once
  - Like a skilled reader who can read ahead while maintaining comprehension (see the sketch below)
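
A toy sketch of training with an extra look-ahead objective; the structure and loss weighting here are invented for illustration and are not DeepSeek’s architecture:

```python
import torch
import torch.nn.functional as F

vocab, dim = 1000, 64
backbone = torch.nn.Embedding(vocab, dim)      # stand-in for the transformer backbone
next_head = torch.nn.Linear(dim, vocab)        # predicts the next token (t+1)
lookahead_head = torch.nn.Linear(dim, vocab)   # also predicts the token after that (t+2)

tokens = torch.randint(0, vocab, (8, 32))      # a batch of token ids
hidden = backbone(tokens)

# Usual next-token loss, plus an extra loss for the token after next, so each
# pass over the text provides more training signal.
loss_next = F.cross_entropy(
    next_head(hidden[:, :-1]).reshape(-1, vocab), tokens[:, 1:].reshape(-1))
loss_ahead = F.cross_entropy(
    lookahead_head(hidden[:, :-2]).reshape(-1, vocab), tokens[:, 2:].reshape(-1))
loss = loss_next + 0.3 * loss_ahead            # the 0.3 weighting is a made-up choice
```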




Why This Matters

- Cost Revolution: Training costs of $5.6M (vs. hundreds of millions) suggest a future where AI development isn't limited to tech giants.
- Working Around Constraints: Shows how limitations can drive innovation. DeepSeek achieved state-of-the-art results without access to the most powerful chips (at least that’s the best conclusion at the moment).




What’s Interesting

- Efficiency vs. Power: Challenges the assumption that advancing AI requires ever-increasing computing power; sometimes smarter engineering beats brute force.
- Self-Teaching AI: R1's ability to develop reasoning capabilities through pure reinforcement learning suggests AIs can discover problem-solving methods on their own.
- AI Teaching AI: The success of distillation shows how knowledge can be transferred between AI models, potentially leading to compounding improvements over time.
- IP for Free: If DeepSeek can be such a fast follower through distillation, what advantage is there for OpenAI, Google, or another company in releasing a novel model?


Helen and Dave Edwards